Preamble

This worksheet contains material for an introductory QGIS course held at the Cathie Marsh Institute for Social Research on 26 February 2020. All material is available on GitHub.

Background

The aim of this short course is to get you started using QGIS as a tool for creating, manipulating, exploring and visualising spatial data. As we will see, QGIS is a powerful piece of software for generating aesthetically pleasing maps, but it is also invaluable for conducting analysis and substantive exploratory research. The material we cover today will equip you with the skills necessary to begin visualising and analysing your own data. Once you’re comfortable with the skills introduced in the guided exercises, please feel free to put these into practice using your own data, as per the guidance in the second worksheet.

You can consider today to be somewhat of a ‘crash course’ in QGIS. As such, it’s worthwhile remembering that the materials we cover today form part of a much wider field commonly known as ‘Geographic Information Science’ (GIS). If you are interested in exploring this field more, there are some recommended books in the further reading section of the second worksheet. That said, this course will give you more than enough information to get started exploring spatial data and making maps!

Why use QGIS?

QGIS is an open-source piece of software. This means, for one thing, that it is free, which represents a significant advantage over comparable software like ArcGIS for students and institutions (understandably) unwilling to fork out for licence fees. But that’s just a practical advantage: open-source also means transparency, continuous development and a supportive community of developers and users. QGIS is a key part of a wider, growing movement towards open-source software in geospatial analysis, with tools like GeoDa and GIS functionality in R becoming increasingly popular too. A key benefit of QGIS being open-source is that it is constantly evolving, with lots of smart people continuously contributing to new versions and plug-ins to expand its capability.

QGIS is part of a wider open-source movement.



You will also find a wealth of documentation and resources online, largely generated by the developers and users themselves. Websites like StackExchange are full of people willing to answer your questions, and I guarantee you that most queries you have will have already been answered somewhere! To date, nearly thirty thousand questions have been asked about QGIS. The friendly online community of QGIS developers and users is possibly the most extensive resource out there!

Spatial data

Before we get to grips with the software itself, let’s cover some preliminary basics, beginning with spatial data types. The diversity of topics in geographical research has motivated the collection of an enormous array of information which can be quantified for use in software like QGIS. Data collected for making maps will inherently include some spatial component, describing the location of an entity in space. It might also incorporate attribute data: non-spatial characteristics which describe entities. There are numerous ways in which spatial data can be stored and used in GIS software, including QGIS, but the most common data types are vector and raster.

Vector data

Vector data represent features in the real world through points, lines and polygons. Standing on top of a skyscraper, overlooking a city, one will observe buildings, parks, street lights and roads, each comprising discrete features of the urban landscape. Vector data is comprised of vertices, which define the geometry of these features. The simplest geometric form is a two-dimensional vertex, a single X (longitude) and Y (latitude) coordinate describing a specific point location. When vertices are connected in order, with different start and end points, a line is formed. Lines with equal start and end points, with at least three vertices, represent polygons. In our urban landscape, points might be used to represent street lights, lines to represent roads, and polygons to represent buildings. A great deal of spatial data defines objects which do not physically exist on the ground, such as electoral wards or neighbourhood boundaries. Through points, lines and polygons we can collectively describe objects in space, and the attributes which describe these objects. Given its popularity in social science research, we will focus on vector data throughout today.
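To make the distinction between points, lines and polygons a bit more concrete, here is a minimal Python sketch that treats a feature as an ordered list of (x, y) vertices and applies the rules described above. The helper name `classify_geometry` and all of the coordinates are made up for illustration; QGIS and other GIS tools store geometries in richer structures than this.

```python
def classify_geometry(vertices):
    """Classify an ordered list of (x, y) vertices using the rules in the text.

    Hypothetical helper for illustration only; this is not a QGIS function.
    """
    if not vertices:
        raise ValueError("a geometry needs at least one vertex")
    if len(vertices) == 1:
        return "point"
    closed = vertices[0] == vertices[-1]
    if closed:
        # The last vertex repeats the first, so do not count it twice.
        if len(vertices) - 1 >= 3:
            return "polygon"
        raise ValueError("a closed ring needs at least three vertices")
    return "line"


# Made-up coordinates: a street light, a road and a building footprint.
street_light = [(2.0, 3.0)]
road = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.5)]
building = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]

for name, feature in [("street light", street_light), ("road", road), ("building", building)]:
    print(name, "->", classify_geometry(feature))
```

The only thing separating a line from a polygon here is whether the first and last vertices match, which is exactly the rule given in the paragraph above.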

Vector data. Source: [Data Carpentry](https://datacarpentry.org/organization-geospatial/02-intro-vector-data/) via the National Ecological Observatory Network (NEON).



Raster data

In some circumstances, vector data is unsuitable. Looking down from our skyscraper, one might also observe variation in air pollution across the city. This cannot easily or intuitively be represented using vector geometries such as lines or polygons. Air pollution might vary considerably within streets or parks, and consequently, attribute data associated with lines or polygons would mask a great deal of information. In such circumstances, raster data may be able to represent the real world more accurately than vector data. Rasters are comprised of a regular grid of cells, each of which contains associated attribute data, and can be used to represent continuous spatial information such as air pollution or remote sensing imagery of the Earth’s surface. The most common usage of raster data you might have come across is in meteorological maps, such as those used in weather reports. Data about regionwide precipitation or temperature, for instance, is often stored in raster format. As noted earlier, today we are going to focus on vector data, but if you’d like to explore raster data examples, please feel free to explore the raster-specific resources at the end of this worksheet, or give me a shout!
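If it helps to picture a raster, the sketch below stores a tiny grid of cells as a nested Python list and looks up the value of the cell containing a given location. The pollution readings, the grid origin and the cell size are all invented for this example, and real raster formats carry much more metadata than this.

```python
# Hypothetical air pollution readings on a 3 x 4 grid (rows run north to south).
pollution = [
    [12.0, 14.5, 15.0, 13.0],
    [11.0, 18.0, 21.5, 16.0],
    [10.5, 12.0, 17.0, 14.0],
]
X_MIN, Y_MAX = 0.0, 300.0  # top-left corner of the grid (assumed units: metres)
CELL_SIZE = 100.0          # each cell covers 100 m x 100 m (assumed)


def value_at(x, y):
    """Return the value of the cell containing the location (x, y)."""
    col = int((x - X_MIN) // CELL_SIZE)
    row = int((Y_MAX - y) // CELL_SIZE)
    if not (0 <= row < len(pollution) and 0 <= col < len(pollution[0])):
        raise ValueError("location falls outside the raster")
    return pollution[row][col]


print(value_at(250.0, 150.0))
```

Every location inside the same cell returns the same value, which is why the cell size (the resolution) matters so much when working with raster data.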

Projections

Representing earth

As we noted earlier, maps are representations of the real world. Importantly, these representations tend to be created on a flat surface (a computer screen or piece of paper) even though the earth itself is more-or-less spherical. In an attempt to portray spatial entities, whether it be crime locations or any other phenomenon, on a flat surface, we perform a transformation known as a ‘projection’. This is quite the mathematical challenge, and can be carried out in countless different ways, each of which has its own advantages and disadvantages. For instance, until recently, Google Maps used a projection known as the Mercator projection, which, whilst useful for navigational purposes, distorts the earth in a manner which makes land masses near the equator, such as Africa, appear much smaller than they actually are, and land masses near the poles, such as Greenland, much larger. For a light-hearted look at different projections of the world map, I recommend this blog post.
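To get a feel for the size of the Mercator distortion, we can use the fact that its linear scale grows as 1 / cos(latitude), so areas are inflated by the square of that. The short Python sketch below applies this; the function name is my own, and the latitudes are rough, illustrative values rather than precise centroids.

```python
import math


def mercator_area_exaggeration(latitude_deg):
    """Factor by which Mercator inflates areas at a latitude, relative to the equator.

    The linear scale of the Mercator projection is 1 / cos(latitude),
    so the area scale is that value squared.
    """
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Approximate, illustrative latitudes (assumptions, not precise values).
for place, lat in [("Equator", 0), ("Manchester", 53.5), ("Central Greenland", 72)]:
    factor = mercator_area_exaggeration(lat)
    print(f"{place}: areas drawn about {factor:.1f} times larger than at the equator")
```

The factor climbs steeply towards the poles, which is why Greenland looks so large on a Mercator map.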


Source: [Brilliant Maps](https://brilliantmaps.com/xkcd/).



Coordinate Reference Systems

When working within GIS software like QGIS, we are subject to the same restrictions, since we are representing real-world information on a flat computer screen. Any spatial information you are using in QGIS, whether it be tram stop locations, neighbourhood boundaries or river formations, must have an associated Coordinate Reference System (CRS). This ensures that we know how our 2D projected maps relate to the actual features on our (pretty much) spherical earth. You are probably vaguely familiar with the most common type of CRS already, although perhaps not by name. It is known as a Geographic Coordinate Reference System, because it uses latitude and longitude coordinates to define specific points on the earth’s surface. The most widely used example is WGS 84. You might have even noticed that when you select a point in Google Maps, it automatically brings up the latitude and longitude coordinates of that location in a white box at the bottom of the window. It is through this system that we can relate your point-and-click to a real place on earth.


An example of latitude and longitude coordinates on the Google Maps online platform.



As we’ll find out later today, not all data you have collected or downloaded will use latitude and longitude coordinates. For example, lots of data released in Britain uses a projected CRS called the British National Grid (BNG), which uses Eastings and Northings to define locations in the British Isles based on a grid system, rather than longitude and latitude. It has some advantages, such as preserving shapes, and one can accurately calculate direction using the BNG. In fact, many areas of the world have their own projected CRS for similar reasons. It is beyond the scope of this course (and indeed, many GIS users) to discuss the merits and shortcomings of different CRS in detail. However, it is important to be aware of the CRS associated with your data, and to ensure that you are using the most appropriate one. Doing so will ensure that you are displaying information accurately, especially when overlaying multiple data sources. We went through this practically during the live demonstration of QGIS, but it will be covered again during the exercises later today, using both WGS 84 and the BNG.
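One quick way to spot which kind of coordinates you have been given is to look at their magnitude. Longitude and latitude are bounded by ±180 and ±90 degrees, whereas BNG eastings and northings are measured in metres across a grid roughly 700 km wide and 1,300 km tall. The Python sketch below is a rough sanity check along those lines. The function name is hypothetical and the Manchester coordinates are approximate; it is no substitute for checking the metadata that came with your data.

```python
def guess_coordinate_type(x, y):
    """Rough guess at whether an (x, y) pair is lon/lat or BNG eastings/northings.

    Hypothetical helper: a sanity check only, not a way of detecting a CRS.
    """
    if -180 <= x <= 180 and -90 <= y <= 90:
        return "longitude/latitude (e.g. WGS 84)"
    if 0 <= x <= 700_000 and 0 <= y <= 1_300_000:
        return "eastings/northings (possibly British National Grid)"
    return "unknown"


# Approximate coordinates for central Manchester in each system.
print(guess_coordinate_type(-2.24, 53.48))
print(guess_coordinate_type(384000, 398000))
```

Note that small values such as (10, 20) would pass both checks, so the function simply tests for longitude and latitude first.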

If you want to read more about projections and CRS, you can read the excellent QGIS documentation online. There are also some useful resources made available by Data Carpentry. Alternatively, please feel free to ask me!


British National Grid nested grids. Source: [Ordnance Survey](https://getoutside.ordnancesurvey.co.uk/guides/beginners-guide-to-grid-references/).



Now we are familiar with some GIS fundamentals, we can move on to opening up QGIS and exploring the interface.

QGIS interface

Quick tour

For these tutorials we will be using QGIS version 3.6.3 in order to match what is in the computer lab. If you are using your own laptop, you might have a different version, but it shouldn’t make too much difference. All previous releases of QGIS can be downloaded from the QGIS website if you want the exact same version.

Basic QGIS interface.



When you start up QGIS, an interface resembling the above screenshot should open. There might be slight differences depending on whether someone has used QGIS on your laptop or computer before. The main window is the map view which visualises any spatial data you create or load into the software, so for now, it’s completely blank. On the left you will have the layers window, which provides a summary of the different “layers” of spatial data you are using. As we saw earlier, one of the most useful functions of QGIS is the ability to overlay different spatial data sources from the same area on top of one another. This window can be used to deselect layers, and change things like transparency to aid exploration of multiple layers simultaneously. At the top of the interface are the various tools available. Some of these have dedicated tab icons, but most functionality is available through the drop-down menus which we’ll explore later. The bottom of the interface includes live information about your map view, and importantly tells you what CRS is currently being used. In this case, our default is EPSG:4326, which is the official registry code for WGS 84, introduced above.

Plug-ins

A key benefit of open-source software like QGIS is the continuous development of its functionality. One way in which QGIS benefits from this is through plug-ins which can be installed directly from within the software. Plug-ins have largely been developed by the QGIS community, and for that reason, they are often updated frequently with new tools and options. Before we get going with some data, we are going to install a plug-in called QuickMapServices which allows you to add Open Street Map base maps to your visualisations. First, navigate to the Plug-ins installation menu via the drop-down menu, search for the plug-in and install it, as demonstrated below.

Step 1: Find the ‘Manage and Install’ option from the Plug-ins drop-down menu.


Step 2: Search for Quick Map Services and click ‘Install Plugin’.

The plug-in will now become available under the ‘Web’ drop-down menu. Don’t worry about using it yet; we will get into that in a minute! Let’s move on to our first exercise.

Exercise 1: tram stops

Raw data

To demonstrate some of the functionality in QGIS, we are going to use some data about tram stops on Greater Manchester’s Metrolink service. The data was compiled from some open government data and information about facilities available at each tram stop. Start by downloading this data directly as a .csv file and saving it in a folder on your machine. Explore it using Excel. It will look something like this:

Data structure for trams_geo.csv.

Each row is an observation, in this case, a tram stop in Greater Manchester, of which there are 93. Each column is a variable giving us additional information about each stop. These variables contain a fair bit of information, from the tram stop name, to the line it is on, the number of cycle stands, blue badge parking spaces and whether it has lift access, and so on. Most importantly for us, there are two variables called eastings and northings respectively. So, we have the spatial location of each tram stop in the projected CRS of British National Grid. But although these coordinates are spatial information, telling us where each tram stop is located on the earth’s surface, Excel is just treating them like any other numeric variable. Using QGIS, we can convert this boring old spreadsheet into spatial data!
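The point that a spreadsheet treats coordinates like any other column can be seen outside Excel too. The Python sketch below reads a couple of rows with the standard csv module. The column names follow the description above, but the column order and the two rows are invented for illustration; they are not taken from the real trams_geo.csv.

```python
import csv
import io

# Two made-up rows using the column names described above. With the real file
# you would use open("trams_geo.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "stop,line,bb_spaces,eastings,northings\n"
    "Stop A,Line 1,4,384000,398000\n"
    "Stop B,Line 2,0,385500,397200\n"
)

stops = []
for row in csv.DictReader(sample):
    # Every value arrives as plain text. The two numbers only become a location
    # once we interpret them against a CRS (here, the British National Grid).
    stops.append((row["stop"], float(row["eastings"]), float(row["northings"])))

for stop in stops:
    print(stop)
```

Nothing in the file itself says that eastings and northings are coordinates; that is knowledge we bring to it, and it is what we will tell QGIS in the next step.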

Creating point data

To make this conversion, we are going to add a new layer to our project in QGIS. We can do this using a specific option designed to pull out coordinates from a delimited text file using Layer -> Add Layer -> Add Delimited Text Layer on the drop-down menus.

Adding a layer from a .csv file.

Bringing up this box will give you a series of options. First, we need to select the .csv file itself using the File name box by finding the file location on our local machine. Doing this will automatically fill in most of the remaining options and bring up a summary of how QGIS has read in the data, identifying the rows and columns. Often, you will have to manually select which columns represent which coordinates, and you will need to specify the CRS. There is a good chance that QGIS has actually done this for you. If not, we know that our coordinate columns are eastings (X field) and northings (Y field). We also know that, because we have eastings and northings columns for locations in Britain, the CRS will be the BNG, with an EPSG code of 27700.
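For those curious about what this dialog is doing behind the scenes, the same choices (file, X field, Y field and CRS) can be written as a single text string that QGIS’s delimited text reader understands. The Python sketch below only builds that string, following the format shown in the PyQGIS documentation as I understand it. The helper name and the file path are hypothetical, and the commented lines at the end would only work inside the QGIS Python console.

```python
def delimited_text_uri(csv_path, x_field, y_field, epsg_code, delimiter=","):
    """Build a URI describing a delimited text layer (hypothetical helper)."""
    return (
        f"file://{csv_path}"
        f"?delimiter={delimiter}"
        f"&xField={x_field}&yField={y_field}"
        f"&crs=EPSG:{epsg_code}"
    )


# Hypothetical file location; change it to wherever you saved the download.
uri = delimited_text_uri(
    "/home/user/qgis_course/trams_geo.csv", "eastings", "northings", 27700
)
print(uri)

# Inside the QGIS Python console (not in plain Python) the URI could be used as:
#   layer = QgsVectorLayer(uri, "trams_geo", "delimitedtext")
#   QgsProject.instance().addMapLayer(layer)
```

You do not need any of this for today’s exercises, since the dialog does the same job, but it can be handy if you ever need to load many files in one go.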

Completing information needed to create spatial points from a .csv file.

Once you’ve completed this information, click Add and close the window, and there we have it! You are viewing the point locations of tram stops in Greater Manchester. You can navigate around this data by scrolling and click+dragging your mouse. The Layer window now also contains trams_geo. Just to make more sense of our introduction earlier, it’s worth mentioning that these points are vector data, a common way of representing specific pinpoint locations in QGIS. Each point has associated attribute data which was contained in the original .csv file. You can view it by clicking on the table symbol in the toolbar at the top of your interface. You’ll notice that this table is exactly the same as our original spreadsheet.

Attribute table icon.

Preliminary exploration

A good way to begin exploring data like this, either for interest or to identify interesting patterns, is by using the Properties... option accessible from each layer. You can access this option by right-clicking on the name of the layer itself in the Layer window, in this case, trams_geo. It will bring up a window with lots of options down the left-hand side, including basic information about the layer itself (e.g. the CRS), but also options to add information to your map using symbology and labels. There are endless options with these properties, many of which we’ll cover today, but for starters, let’s add some labels to our points so we can identify what is what. A basic single label can be used to display the stop variable (i.e. the name of the stop) for each point, as shown in the below screenshot. Feel free to make amendments to the text font, style and size as you see fit.

Adding labels to points.


Once you click Apply the labels will be added in the map view window. Now we can actually see which point corresponds to which tram stop.

Map view of points with labels.


In a similar manner to how we have just linked the stop variable to each point to display information, we can use these properties to change other visual features. An accessible and simple way to portray information about each point is to colour them by a variable using symbology, also available in the Properties... window. Let’s colour each point according to the line on which the stop is situated. Because the variable containing this information, line, is discrete (i.e. categorical), we will replace the current basic single symbol with the Categorized option. Your choice in this drop-down menu very much depends on what feature or variable in the map you want to change.


Choosing a categorized symbology.

We can then select the Column we are interested in, which in this case is line, from the drop-down menu, and change the colour ramp to apply to each category. Because lines are discrete and don’t have any inherent order, I will just keep it as random colours. Next, click Classify to generate the categories. Often, QGIS will by default create an ‘other’ category for observations which do not fall into any category (e.g. missing values). You can remove this category by highlighting it, and clicking on the red minus sign. If you don’t like the colours created at random, or they are too similar to one another, you can right-click on each category and select Change Color.
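Conceptually, Classify is finding each distinct value in the chosen column and giving it its own colour. The Python sketch below mimics that idea with random colours. It is only an illustration of the logic, not how QGIS implements it, and the function name and line names are invented.

```python
import random


def categorise(values, seed=0):
    """Map each distinct value to a random RGB colour (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the colours are repeatable
    categories = sorted(set(values))
    return {c: tuple(rng.randrange(256) for _ in range(3)) for c in categories}


# Hypothetical line names, one entry per tram stop.
lines = ["Line 1", "Line 2", "Line 1", "Line 3", "Line 2"]
colours = categorise(lines)

for line, rgb in colours.items():
    print(line, rgb)
```

Five stops produce only three categories here, because stops on the same line share a colour. Random colours can end up looking alike, which is why it is worth checking them by eye and changing any that clash.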


Colouring each point by the line variable, and creating a discrete categorisation. You can change each colour by hand.


As before, clicking on Apply will make these changes in our map view. It now contains information on which stop is which, and makes a distinction between the different lines.

Updated map view with points coloured by the line variable.


The possibilities of symbology preferences are pretty expansive. We can visualise a continuous variable using the Graduated option (instead of Categorized, used above). For this example, let’s use the bb_spaces variable, which tells us how many blue badge parking spaces are available at each tram stop. Doing so will give us an indication as to which tram stations have more or fewer spaces, but it will also tell us whether there is a meaningful geographic distribution to these patterns. Have a go at this now, using the below example as a guide, making amendments to things like the number of classes, as you find appropriate. Note that we are choosing to size each point according to the bb_spaces variable.
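The number of classes matters because a graduated symbology first cuts the variable into bins and then gives each bin its own symbol size. QGIS offers several ways of choosing the cut points; the Python sketch below shows one of the simplest, equal intervals, as an illustration of the idea rather than of QGIS’s own code. The function names, the counts of spaces and the symbol sizes are all made up.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bounds of n equal-width classes spanning the range of the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def class_index(value, breaks):
    """Index of the first class whose upper bound is at least the value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


# Hypothetical counts of blue badge spaces at six stops.
bb_spaces = [0, 2, 3, 8, 12, 20]
breaks = equal_interval_breaks(bb_spaces, 4)

# Assumed sizing rule: start at size 2 and add 2 for each class above the first.
sizes = [2 + 2 * class_index(v, breaks) for v in bb_spaces]

print(breaks)
print(sizes)
```

Try changing the number of classes and see how stops move between size groups; the same thing happens on your map when you change the classes in the symbology window.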

Changing our symbology preferences to size points according to the number of blue badge parking spaces.


We can see that, not only is there variation between tram stops in the number of blue badge parking spaces, but there is a spatial patterning to these distributions, with the city centre having few spaces, and stops near the end of lines having many. It also helps us spot potential issues. It is unlikely that Manchester Airport has no blue badge spaces nearby, so such visual explorations can help us identify areas which demand further explanation. For example, perhaps the spaces are not near the tram stop itself, or are not free.

Updated map view with points sized according to the number of spaces.


To give a bit of local context to these maps, we can make use of the Quick Map Services plug-in we downloaded earlier to add an Open Street Map layer to our project. We can do this by selecting Web -> QuickMapServices -> OSM -> OSM Standard.

Loading an Open Street Map layer.


We can alter the appearance of this layer using the symbology options in its properties. For the below map, the Open Street Map (OSM) layer has been made grayscale and the brightness is quite high, so that our tram stop points stand out. The label names have now been turned off because the OSM layer is now giving the context. We can now interactively explore the map to find explanations for the patterns observed, or perhaps to identify areas where city planners could improve accessibility for blue badge holders at tram stops.


Updated map view with points sized according to the number of spaces, with an OSM layer.


Spend some time trying out different labelling and symbology options on different variables to answer your own research questions. How do different tram stops fare when it comes to other forms of accessibility, such as ramps? How might we best visualise this?

Once you’ve had a good exploration of the tram data, feel free to move on to Exercise 2 in the next worksheet using police recorded crime data.